335 research outputs found
Information Access Using Neural Networks For Diverse Domains And Sources
The ever-increasing volume of web-based documents makes it difficult to efficiently access specialized knowledge from domain-specific sources, which requires a deep understanding of the domain and substantial comprehension effort. Although natural language technologies, such as information retrieval and machine reading comprehension systems, offer rapid and accurate information access, their performance in specific domains is hindered by training on general-domain datasets. Creating domain-specific training datasets, while effective, is time-consuming, expensive, and heavily reliant on domain experts. This thesis presents a comprehensive exploration of efficient technologies for information access in specific domains, focusing on retrieval-based systems that encompass question answering and ranking.
We begin with a comprehensive introduction to information access systems. We demonstrate the structure of an information access system through a typical open-domain question-answering task, outlining its two major components, the retrieval and reader models, and the design choices for each. We focus on three points: 1) how the two components are connected; 2) the trade-offs associated with the retrieval model and the best frontier in practice; and 3) a data augmentation method that adapts a reader model, initially trained on closed-domain datasets, to answer questions effectively in the retrieval-based setting.
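The retriever-reader connection described above can be sketched with a bag-of-words retriever and a stubbed reader. This is a minimal illustration of how the two components connect, not the thesis's models; all function names are ours:

```python
import math
from collections import Counter

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two bag-of-words vectors.
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(question: str, passages: list[str], k: int = 2) -> list[str]:
    # Retrieval component: score every passage against the question, keep top-k.
    q = Counter(question.lower().split())
    ranked = sorted(passages,
                    key=lambda p: cosine(q, Counter(p.lower().split())),
                    reverse=True)
    return ranked[:k]

def answer(question: str, passages: list[str]) -> str:
    # Reader component (stubbed): a real reader model extracts or generates
    # an answer from the retrieved passages; returning the top passage only
    # shows where the retriever's output feeds the reader's input.
    return retrieve(question, passages, k=1)[0]
```

In practice, the design choice lies in how many passages the retriever passes forward and whether the reader sees them jointly or independently.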
Subsequently, we discuss various methods for adapting the system to specific domains. Transfer learning techniques are presented, including generation as data augmentation, further pre-training, and progressive domain-clustered training. We also present a novel zero-shot re-ranking method inspired by compression-based distance. We summarize the conclusions and findings gathered from these experiments.
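A compression-based distance of the kind that inspires such zero-shot re-ranking can be sketched with a general-purpose compressor. The normalized compression distance below uses zlib as a stand-in for an ideal compressor; this is an illustration of the underlying idea, not the proposed method:

```python
import zlib

def ncd(x: bytes, y: bytes) -> float:
    # Normalized compression distance: approximates the (uncomputable)
    # information distance by using real compressed lengths as a stand-in
    # for Kolmogorov complexity.
    cx = len(zlib.compress(x))
    cy = len(zlib.compress(y))
    cxy = len(zlib.compress(x + y))
    return (cxy - min(cx, cy)) / max(cx, cy)

def rerank(query: str, docs: list[str]) -> list[str]:
    # Zero-shot re-ranking: no training needed, just order candidates
    # by compression distance to the query.
    q = query.encode()
    return sorted(docs, key=lambda d: ncd(q, d.encode()))
```

Documents that share structure with the query compress well when concatenated with it, so they receive a smaller distance and rank higher.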
Moreover, the exploration extends to retrieval-based systems beyond textual corpora. We explore a search system for an e-commerce database, in which natural language queries are combined with user preference data to retrieve relevant products. To address the challenges of noisy labels and the cold-start problem in this retrieval-based e-commerce ranking system, we enhance model training through cascaded training and adversarial sample weighting. We also investigate a search system in the math domain, characterized by the unique role of formulas and by features distinct from textual search. We tackle math-related search problems by combining neural ranking models with structurally optimized algorithms.
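Adversarial sample weighting is described only at a high level here. One plausible reading, down-weighting examples whose loss is suspiciously high and therefore more likely to carry a noisy label, can be sketched as follows (a hypothetical scheme for illustration, not the authors' exact method):

```python
import math

def adversarial_sample_weights(losses: list[float],
                               temperature: float = 1.0) -> list[float]:
    # Hypothetical weighting: examples with unusually high loss are more
    # likely to be mislabeled, so they receive exponentially smaller weight.
    # The temperature controls how sharply high-loss samples are suppressed;
    # weights are normalized to sum to 1.
    raw = [math.exp(-loss / temperature) for loss in losses]
    total = sum(raw)
    return [w / total for w in raw]
```

Such weights would multiply each example's loss term during training, so clean samples dominate the gradient.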
Finally, we summarize the research findings and outline future research directions.
An Analysis of the Profitability of Commercial Banks in China
The profitability of the Chinese banking industry is affected by many determinants, including variables within each bank and several important macroeconomic variables that affect the performance of commercial banks. Because these effects on bank profitability may be related to government policies, they deserve serious attention. Based on annual data from 124 commercial banks in China over 2013-2018, we selected nine variables: return on average assets (ROAA), capital adequacy ratio (EQTA), insolvency risk (Z-score), bank size (TA), liquidity (NLTA), asset quality (LLRGL), cost efficiency (CTI), inflation rate (INF), and GDP growth rate (GDPGR). Using multicollinearity tests, endogeneity tests, and a GMM model to analyze the profitability of the Chinese banking industry, we conclude that the return on average assets of Chinese banks is positively correlated with the GDP growth rate and insolvency risk, and negatively correlated with bank size and cost efficiency. We offer suggestions for increasing bank profitability from the perspectives of optimizing the Chinese banking business model, shaping government policy toward commercial banks, and establishing a comprehensive financial supervision system.
Ultra-compact silicon nitride grating coupler for microscopy systems
Grating couplers have been widely used for coupling light between photonic chips and optical fibers. For various quantum-optics and bio-optics experiments, on the other hand, there is a need for good light coupling between photonic chips and microscopy systems. Here, we propose an ultra-compact silicon nitride (SiN) grating coupler optimized for coupling light from a waveguide to a microscopy system. The grating coupler is about 4 × 2 μm² in size, and a 1 dB bandwidth of 116 nm can be achieved theoretically. An optimized fabrication process was developed to realize suspended SiN waveguides integrated with these couplers on top of a highly reflective bottom mirror. Experimental results show that up to 53% (2.76 dB loss) of the power of the TE mode can be coupled from a suspended SiN waveguide to a microscopy system with a numerical aperture (NA) of 0.65. Simulations show this efficiency can increase up to 75% (1.25 dB loss) for NA = 0.95.
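The quoted losses are simply the decibel form of the stated power coupling efficiencies, via loss(dB) = -10 log10(η). A quick check of that conversion (the function name is ours):

```python
import math

def coupling_loss_db(efficiency: float) -> float:
    # Insertion loss in dB corresponding to a power coupling efficiency
    # (a fraction between 0 and 1).
    return -10 * math.log10(efficiency)
```

With the reported efficiencies, `coupling_loss_db(0.53)` rounds to 2.76 dB and `coupling_loss_db(0.75)` rounds to 1.25 dB, matching the figures above.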
Segatron: Segment-Aware Transformer for Language Modeling and Understanding
Transformers are powerful for sequence modeling. Nearly all state-of-the-art
language models and pre-trained language models are based on the Transformer
architecture. However, the Transformer distinguishes sequential tokens only with the token
position index. We hypothesize that better contextual representations can be
generated from the Transformer with richer positional information. To verify
this, we propose a segment-aware Transformer (Segatron), by replacing the
original token position encoding with a combined position encoding of
paragraph, sentence, and token. We first introduce the segment-aware mechanism
to Transformer-XL, which is a popular Transformer-based language model with
memory extension and relative position encoding. We find that our method can
further improve the Transformer-XL base model and large model, achieving 17.1
perplexity on the WikiText-103 dataset. We further investigate the pre-training
masked language modeling task with Segatron. Experimental results show that
BERT pre-trained with Segatron (SegaBERT) can outperform BERT with vanilla
Transformer on various NLP tasks, and outperforms RoBERTa on zero-shot sentence
representation learning. Comment: Accepted by AAAI 202
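The combined position encoding described above can be sketched as the sum of three embedding lookups, one per granularity. This is an illustrative reconstruction with arbitrary dimensions, not the authors' implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

class SegmentAwarePositionEncoding:
    """Sketch of a Segatron-style combined position encoding: the single
    token-index embedding is replaced by the sum of paragraph-, sentence-,
    and token-level position embeddings (dimensions are illustrative)."""

    def __init__(self, d_model=16, max_para=8, max_sent=16, max_tok=128):
        # One learnable table per granularity; random init stands in for training.
        self.para = rng.normal(size=(max_para, d_model))
        self.sent = rng.normal(size=(max_sent, d_model))
        self.tok = rng.normal(size=(max_tok, d_model))

    def __call__(self, para_idx, sent_idx, tok_idx):
        # Each index array has shape (seq_len,); output is (seq_len, d_model).
        return self.para[para_idx] + self.sent[sent_idx] + self.tok[tok_idx]
```

Two tokens at the same token position but in different sentences now receive different encodings, which is the extra signal the hypothesis relies on.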
Approximating Human-Like Few-shot Learning with GPT-based Compression
In this work, we conceptualize the learning process as information
compression. We seek to equip generative pre-trained models with human-like
learning capabilities that enable data compression during inference. We present
a novel approach that utilizes the Generative Pre-trained Transformer (GPT) to
approximate Kolmogorov complexity, with the aim of estimating the optimal
Information Distance for few-shot learning. We first propose using GPT as a
prior for lossless text compression, achieving a noteworthy compression ratio.
An experiment with the LLAMA2-7B backbone achieves a compression ratio of 15.5 on
enwik9. We justify the pre-training objective of GPT models by demonstrating
its equivalence to the compression length, and, consequently, its ability to
approximate the information distance for texts. Leveraging the approximated
information distance, our method allows the direct application of GPT models in
quantitative text similarity measurements. Experimental results show that our
method overall achieves superior performance compared to embedding and prompt
baselines on challenging NLP tasks, including semantic similarity, zero- and
one-shot text classification, and zero-shot text ranking.
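The core idea above, that a model's negative log-likelihood is a code length and that conditional code lengths approximate the information distance, can be illustrated with a toy character-level model standing in for GPT. The unigram model and all function names are our simplification, not the paper's method:

```python
import math
from collections import Counter

def code_length_bits(text: str, model_counts: Counter,
                     alphabet_size: int = 256) -> float:
    # Shannon code length under a probabilistic model: -log2 of the
    # probability the model assigns. A unigram character model with
    # Laplace smoothing stands in for GPT's next-token distribution.
    total = sum(model_counts.values())
    bits = 0.0
    for ch in text:
        p = (model_counts[ch] + 1) / (total + alphabet_size)
        bits += -math.log2(p)
    return bits

def information_distance(x: str, y: str) -> float:
    # Approximate the information distance max(K(x|y), K(y|x)) with
    # conditional code lengths: C(y|x) is y's code length under a model
    # fitted on x, and symmetrically for C(x|y).
    cx_given_y = code_length_bits(x, Counter(y))
    cy_given_x = code_length_bits(y, Counter(x))
    return max(cx_given_y, cy_given_x)
```

Replacing the unigram model with a GPT conditioned on the other text recovers the paper's setting: texts the model predicts well from each other are "close".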